(57) Abstract



The invention features nucleic acid molecules and, in particular, DNA molecules having catalytic activity, as well as methods for 
obtaining and using such nucleic acid molecules.

CATALYTIC DNA
Background of the Invention 
This invention relates to DNA molecules having
catalytic activity and methods of obtaining and using
such DNA molecules.

Ribozymes are highly structured RNA molecules that
carry out specific chemical reactions (e.g., cleavage of
RNA, cleavage of DNA, polymerization of RNA, and
replication of RNA), often with kinetic efficiencies
comparable to those of most engineered enzymes.

Summary of the Invention 
The invention features nucleic acid molecules
having catalytic activity, as well as methods for
obtaining and using such nucleic acid molecules.

The methods of the invention entail sequential
in vitro selection and isolation of nucleic acid
molecules having the desired properties (e.g., catalytic
activity, such as ligase activity) from pools of
single-stranded nucleic acid molecules (e.g., DNA, RNA, or
modifications or combinations thereof) containing random
sequences. The isolated nucleic acid molecules are then
amplified by using, e.g., the polymerase chain reaction
(PCR).

The rounds of selection and amplification may be
repeated one or more times; after each round, the pool of
molecules is enriched for those molecules having the
desired activity. Although the number of desired
molecules in the initial pool may be exceedingly small,
the sequential selection scheme overcomes this problem by
repeatedly enriching for the desired molecules.
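
The enrichment argument can be illustrated with a toy calculation. The sketch below is only an illustration under stated assumptions: the starting fraction of active molecules (1e-12) and the per-round enrichment factor (100) are hypothetical values chosen for the example and do not come from the experiments described here.

```python
def enrich(fraction_active, enrichment_per_round, rounds):
    """Return the active fraction before selection and after each round.

    Toy model: in every round, active molecules are recovered
    `enrichment_per_round` times more efficiently than inactive ones.
    """
    history = [fraction_active]
    for _ in range(rounds):
        active = fraction_active * enrichment_per_round
        inactive = 1.0 - fraction_active
        fraction_active = active / (active + inactive)
        history.append(fraction_active)
    return history


# Hypothetical starting point: one active molecule per 10^12 pool molecules.
history = enrich(fraction_active=1e-12, enrichment_per_round=100.0, rounds=9)
for round_number, fraction in enumerate(history):
    print(f"round {round_number}: active fraction = {fraction:.3e}")
```

Under these assumed numbers the active fraction stays negligible for several rounds and then rises quickly, which is why repeated cycles can recover molecules that are exceedingly rare in the initial pool.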

The pool of single-stranded nucleic acid molecules 
employed in the invention may be referred to as "random 
nucleic acid molecules" or as containing "random 



WO 96/40723 



PCT/US96/09358 



sequences." These general terms are used to describe
molecules or sequences which have one or more regions of
"fully random sequence." In a fully random sequence,
there is an approximately equal probability of A, T/U, C,
or G being present at each position in the sequence. Of
course, the limitations of some methods used to create
nucleic acid molecules make it rather difficult to
synthesize fully random sequences in which the
probability of each nucleotide occurring at each position
is absolutely equal. Accordingly, sequences in which the
probabilities are roughly equal are considered fully
random sequences.

In "partially random sequences" and "partially
randomized sequences," rather than there being a 25%
chance of A, T/U, C, or G being present at each position,
there are unequal probabilities. For example, in a
partially random sequence, there may be a 70% chance of A
being present at a given position and a 10% chance of
each of T/U, C, or G being present at that position.
Further, the probabilities can be the same or different
at each position within the partially randomized region.
Thus, a partially random sequence may include one or more
positions at which the sequence is fully random, one or
more positions at which the sequence is partially random,
and/or one or more positions at which the sequence is
defined.
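
The three kinds of position (defined, partially random, fully random) can be sketched as per-position base probabilities. The following is a minimal illustration, not a synthesis protocol: the 50-nucleotide parent sequence is a hypothetical placeholder, the helper names are invented for this example, and the 70%/10% bias simply reuses the figures quoted above.

```python
import random

BASES = "ACGT"


def weights_for(kind, base=None, bias=0.7):
    """Return base probabilities for one position.

    kind: "defined" (always `base`), "biased" (`base` with probability
    `bias`, the rest split equally), or "random" (25% each).
    """
    if kind == "random":
        return {b: 0.25 for b in BASES}
    if kind == "defined":
        return {b: (1.0 if b == base else 0.0) for b in BASES}
    if kind == "biased":
        other = (1.0 - bias) / 3.0
        return {b: (bias if b == base else other) for b in BASES}
    raise ValueError(f"unknown position kind: {kind}")


def build_design(parent, critical_positions, critical_kind):
    """Treat the 1-based critical positions as `critical_kind`; randomize the rest."""
    design = []
    for index, base in enumerate(parent, start=1):
        if index in critical_positions:
            design.append(weights_for(critical_kind, base))
        else:
            design.append(weights_for("random"))
    return design


def sample(design, rng):
    """Draw one sequence from a per-position probability design."""
    return "".join(
        rng.choices(BASES, weights=[w[b] for b in BASES])[0] for w in design
    )


# Hypothetical 50-nt parent sequence (placeholder, not from the source).
PARENT = ("ACGT" * 13)[:50]
CRITICAL = {5, 7, 8, 9}

rng = random.Random(0)
fixed_design = build_design(PARENT, CRITICAL, "defined")
biased_design = build_design(PARENT, CRITICAL, "biased")
print(sample(fixed_design, rng))
print(sample(biased_design, rng))
```

With the "defined" design every sampled sequence keeps the parent bases at the critical positions; with the "biased" design those positions usually, but not always, match the parent.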

Partially random sequences are particularly useful
when one wishes to make variants of a known sequence.
For example, if one knows that a particular 50 nucleotide
sequence possesses a desired catalytic activity and that
positions 5, 7, 8, and 9 are critical for this activity,
one could prepare a partially random version of the
50 nucleotide sequence in which the bases at positions 5,
7, 8, and 9 are the same as in the catalytically active
sequence, and the other positions are fully randomized.
Alternatively, one could prepare a partially random
sequence in which positions 5, 7, 8, and 9 are partially
randomized, but with a strong bias towards the bases
found at each position in the original molecule, with all
of the other positions being fully randomized. This type
of partially random sequence is desirable in pools of
molecules from which catalytic nucleic acids are being
selected. The sequence of any randomized region may be
further randomized by mutagenesis during one or more
amplification steps.

In addition to random or partially random
sequences, it may also be desirable to have one or more
regions of "defined sequence." A defined sequence is a
sequence selected or known by the creator of the
molecule. Defined sequence regions are useful for
isolating or PCR amplifying the nucleic acid molecule
because they may be recognized by defined complementary
primers. The defined sequence regions may flank the
random regions or be intermingled with the random
regions. The defined regions can be of any length
desired and are readily designed using knowledge in the
art (see, for example, Ausubel et al., Current Protocols
in Molecular Biology, Greene Publishing, New York, New
York (1994); Ehrlich, PCR Technology, Stockton Press, New
York, New York (1989); and Innis et al., PCR Protocols: A
Guide to Methods and Applications, Academic Press, Inc.,
San Diego, CA (1990)).

The selection method of the invention involves
contacting a pool of nucleic acid molecules containing
random sequences with the substrate for the desired
catalytic activity under conditions (including, e.g.,
nucleic acid molecule concentrations, temperature, pH,
and salt) which are favorable for the catalytic activity.
Nucleic acid molecules having the catalytic activity are
partitioned from those which do not, and the partitioned
nucleic acid molecules having the catalytic activity then
are amplified using, e.g., PCR.

The steps of contacting, partitioning, and
amplifying may be repeated any desired number of times.
Several cycles of selection (contacting, partitioning,
and amplifying) may be desirable because after each round
the pool is more enriched for the desired catalytic
nucleic acids. One may choose to perform so many cycles
of selection that no substantial improvement in catalytic
activity is observed upon further selection, or one may
carry out far fewer cycles of selection.

Methods known in the art may be used at particular
steps of this selection and isolation procedure, and one
skilled in the art is referred to Ellington and Szostak,
Nature 346:818-822, 1990; Lorsch and Szostak, Nature
371:31-36, 1994; Tuerk and Gold, Science 249:505-510,
1990; and methods described herein.

In addition, one may mutagenize isolated catalytic
nucleic acids in order to generate and subsequently
isolate molecules exhibiting improved catalytic activity.
For example, one may prepare degenerate pools of
single-stranded nucleic acids based on a particular catalytic
nucleic acid sequence, or one may first identify
important regions in a catalytic nucleic acid sequence
(for example, by standard deletion analysis), and then
prepare pools of candidate catalytic nucleic acid
molecules that include degenerate sequences at those
important regions.

Those skilled in the art can readily identify
catalytic nucleic acid consensus sequences by sequencing
a number of catalytic nucleic acid molecules and
comparing their sequences. In some cases, such
sequencing and comparison will reveal the presence of a
number of different conserved sequences. In these
circumstances, one may identify a core sequence which is






common to most or all of the isolated sequences. This
core sequence, or variants thereof, may be used as the
starting point for the selection of improved catalysts.
By "variant" of a sequence is meant a sequence created by
partially randomizing the sequence.

The size of the randomized regions employed should
be adequate to provide a catalytic site. Thus, the
randomized region used in the initial selection
preferably includes between 10 and 300 nucleotides, for
example, between 25 and 180 nucleotides.
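
To put these lengths in perspective, the number of possible sequences of N randomized positions is 4^N. The short calculation below is an illustrative sketch: the lengths 25 and 180 are the example bounds given above, while the 116-base region and the pool of about 3.5 x 10^14 different molecules are the values quoted in the description of Fig. 1.

```python
import math


def log10_sequence_space(n_random_positions):
    """log10 of the number of distinct sequences with n fully random positions."""
    return n_random_positions * math.log10(4)


for n in (25, 116, 180):
    print(f"N = {n}: about 10^{log10_sequence_space(n):.1f} possible sequences")

# Pool complexity quoted in the description of Fig. 1.
pool_size = 3.5e14
coverage_log10 = math.log10(pool_size) - log10_sequence_space(116)
print(f"a pool of {pool_size:.1e} molecules samples about "
      f"10^{coverage_log10:.1f} of the 116-base sequence space")
```

Any practical pool therefore samples only a minute fraction of the possible sequences for regions of this size, which is consistent with the statement that desired molecules may be exceedingly rare in the initial pool.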

It may be desirable to increase the stringency of
a selection step in order to isolate more active molecules. The
stringency of the selection step may be increased by
decreasing the substrate concentration. The stringency of
the catalysis selection step can be increased by
decreasing the ligand concentration or the reaction time.

In one aspect, therefore, the invention features a
method for obtaining a nucleic acid molecule having
ligase activity. In the first step of this method, a
population of candidate nucleic acid molecules, each
having a region of random sequence, is contacted with a
substrate nucleic acid molecule and an external template.
The external template is complementary to a portion of
the 3' region of the substrate nucleic acid molecule and
a portion of the 5' region of each of the candidate
nucleic acid molecules in the population. Alternatively,
the external template may be complementary to a portion
of the 5' region of the substrate nucleic acid molecule
and a portion of the 3' region of each of the candidate
nucleic acid molecules in the population. Binding of the
external template to the substrate nucleic acid molecule
and a candidate nucleic acid molecule from the population
juxtaposes the 3' region of one of the molecules with the
5' region of the other.

One of the terminal nucleotides (either the 5' or
the 3' nucleotide) of the juxtaposed regions may contain
an activated group. Activated groups that may be used in
the method of the invention include, but are not limited
to, 5'-phosphoro(2-methyl)imidazolide, a
5'-phosphorimidazolide, cyanogen bromide, and carbodiimides
(e.g., 1-ethyl-3-(3'-dimethylaminopropyl)carbodiimide
(CDI), 1-cyclohexyl-3-(2-morpholinyl-(4)-ethyl)-carbodiimide
metho-p-toluenesulfonate, CDI-1, and CDI-2).
As a specific example, the activated group is a
3'-phosphorimidazolide on the 3' terminal nucleotide of the
substrate. Activating groups are added to the nucleic
acid molecules used in the methods of the invention by
using methods known in the art.

Alternatively, if desired, external templating in
this first step may be omitted. It is not essential
to the selection method of the invention.

In the second step of this method of the
invention, a subpopulation of nucleic acid molecules
having ligase activity is isolated from the population.
This may be accomplished by, e.g., affinity
chromatography followed by selective PCR amplification.
For example, the substrate nucleic acid and/or the
nucleic acid from the population may contain the first
member of a specific binding pair (e.g., biotin). As a
specific example, the terminal nucleotide of the
substrate nucleic acid (e.g., the 5' terminal nucleotide
of the substrate nucleic acid) and/or the nucleic acid
molecule from the population that is not juxtaposed by
the external template may be labeled with biotin.
Isolation of molecules containing biotin may be
accomplished by contacting the molecules with immobilized
avidin, e.g., a streptavidin agarose affinity column.
Other specific binding pairs known to one skilled in the
art may be used in the method of the invention.

The isolated subpopulation may be amplified in
vitro using, e.g., PCR. In selective PCR, the first
primer is complementary to a sequence of the substrate
nucleic acid molecule and the second primer is
complementary to the opposite strand of a sequence in the
population. Use of these primers therefore results in
amplification of only those nucleic acid molecules which are a
product of the ligation of the substrate to a nucleic
acid molecule from the population. In order to generate
a population of nucleic acid molecules for further rounds
of selection, nested PCR amplification may be carried out
using primers which preferably include the terminal
nucleotides of the nucleic acid from the population that
was ligated to the substrate nucleic acid.
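
The logic of selective PCR can be sketched as a simple string check: a molecule is amplified only if it carries the substrate sequence upstream of the binding site for the second primer. The example below is a simplified illustration, not a PCR simulation. It borrows the substrate and primer sequences quoted in the description of Fig. 1, follows the wording used there (a first primer corresponding to the substrate sequence), assumes that the pool strand begins with the first pool primer sequence and ends with the reverse complement of the second, and uses a short hypothetical placeholder for the random region.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


SUBSTRATE = "AAGCATCTAAGCATCTCAAGC"          # SEQ ID NO: 31, without end groups
POOL_5_CONSTANT = "GGAACACTATCCGACTGGCACC"   # SEQ ID NO: 29 (assumed pool 5' end)
SECOND_PRIMER = "CGGGATCCTAATGACCAAGG"       # SEQ ID NO: 30, without biotin
POOL_3_CONSTANT = reverse_complement(SECOND_PRIMER)  # assumed pool 3' end


def is_selectively_amplified(molecule, first_primer, second_primer):
    """True if the first primer sequence occurs in `molecule` and the
    binding site of the second primer lies downstream of it."""
    start = molecule.find(first_primer)
    if start < 0:
        return False
    site = reverse_complement(second_primer)
    return molecule.find(site, start + len(first_primer)) >= 0


# Hypothetical stand-in for the random region (the real region is far longer).
random_region = "T" * 12
unligated = POOL_5_CONSTANT + random_region + POOL_3_CONSTANT
ligated = SUBSTRATE + unligated

print(is_selectively_amplified(unligated, SUBSTRATE, SECOND_PRIMER))
print(is_selectively_amplified(ligated, SUBSTRATE, SECOND_PRIMER))
```

Only the ligated molecule satisfies both primer requirements, which mirrors the statement that selective PCR amplifies only ligation products.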

The above-described steps of contacting,
isolating, and amplifying may be repeated on the
subpopulations of nucleic acid molecules obtained. The
additional rounds of selection may be carried out in the
presence or absence of the external template. Nucleic
acid molecules isolated using the above-described method
may be subcloned into a vector (e.g., a plasmid) and
further characterized by, e.g., sequence analysis.

In a second aspect, the invention features a DNA
molecule capable of acting as a catalyst. A catalyst is
a molecule which enables a chemical reaction to proceed
under different conditions (e.g., at a lower temperature,
with lower reactant concentrations, or with increased
kinetics) than otherwise possible.

In a third aspect, the invention features a DNA
molecule capable of acting as a catalyst on a nucleic
acid substrate. This catalysis does not require the
presence of a ribonucleotide in the nucleic acid
substrate.

In a fourth aspect, the invention features a
nucleic acid molecule having ligase activity, e.g., DNA
or RNA ligase activity. The nucleic acid molecule may be
DNA, RNA, or combinations or modifications thereof.

In a fifth aspect, the invention features a
nucleic acid molecule capable of ligating a first
substrate nucleic acid to a second substrate nucleic
acid. The rate of ligation catalyzed by the nucleic acid
molecule of the invention is greater than the rate of
ligation of the substrate nucleic acids by templating
under the same reaction conditions, which include such
variables as, e.g., substrate concentration,
template/enzyme concentration, nature and quantity of
base-pairing interactions between substrates and
template/enzyme, type of activating group, salt, pH, and
temperature. Templating is the joining of two substrate
nucleic acid molecules when hybridized to contiguous
regions of a "template" nucleic acid strand.

In a sixth aspect, the invention features a
catalytic DNA molecule capable of ligating a first
substrate nucleic acid to a second substrate nucleic
acid. The first substrate nucleic acid contains the
sequence 3'-S1-S2-5', the second substrate nucleic acid
contains the sequence 3'-S3-S4-5', and the catalytic DNA
molecule contains the sequence
5'-E1-TTT-E2-AGA-E3-E4-E5-E6-3'.

For these substrate and catalytic DNA molecules,
S1 contains at least two (for example, 2-100, 4-16, or
8-12) nucleotides positioned adjacent to the 3' end of S2.
The S1 nucleotides are complementary to an equivalent
number of nucleotides in E1 that are positioned adjacent
to the 5' end of TTT.

S2 contains one to three (for example, 1)
nucleotides, S3 contains one to six (for example, 3)
nucleotides, and the 5' terminal nucleotide of S2 and the
3' terminal nucleotide of S3 alternatively contain an
activated group or a hydroxyl group.

S4 contains at least two (for example, 2-100,
4-16, or 8-12) nucleotides positioned adjacent to the 5'
end of S3. The S4 nucleotides are complementary to an
equivalent number of nucleotides in E6 that are
positioned adjacent to the 3' end of E5.

E1 contains at least two (for example, 2-100,
4-16, or 8-12) nucleotides positioned adjacent to the 5'
end of TTT. The E1 nucleotides are complementary to an
equivalent number of nucleotides in S1 that are
positioned adjacent to the 3' end of S2.

E2 contains 0-12 nucleotides, for example, 3-4
nucleotides.

E3 contains at least two (for example, 2-100,
3-50, 5-20, or 5) nucleotides positioned adjacent to the 3'
end of AGA. The E3 nucleotides are complementary
to an equivalent number of nucleotides in E5 that are
positioned adjacent to the 5' end of E6.

E4 contains at least 3 (for example,
3-200, 3-30, 3-8, 4-6, or 5) nucleotides. Alternatively,
E4 may contain zero nucleotides. In this case, the 3'
end of E3 and the 5' end of E5 would not be linked to
another nucleic acid segment (e.g., E4), and the enzyme
therefore would be made up of two separate nucleic acid
molecules (the first containing 5'-E1-TTT-E2-AGA-E3-3',
and the second containing 5'-E5-E6-3').

E5 contains at least two (for example, 2-100,
3-50, 5-20, or 5) nucleotides positioned adjacent to the 5'
end of E6. The E5 nucleotides are complementary to an
equivalent number of nucleotides in E3 that are
positioned adjacent to the 3' end of AGA.

E6 contains at least two (for example, 2-100,
4-16, or 8-12) nucleotides positioned adjacent to the 3'
end of E5. The E6 nucleotides are complementary to an
equivalent number of nucleotides in S4 that are
positioned adjacent to the 5' end of S3.

In the case of long stem structures formed by,
e.g., S1 and E1, S4 and E6, or E3 and E5, the stem
structures may contain mismatches, provided that a stem
structure is maintained.

The 5' most nucleotide of S2, the 3' most
nucleotide of S3, and the second 3' most nucleotide of S3
may be complementary to the 5' most nucleotide of E2, the
second 5' most nucleotide of E2, and the third 5' most
nucleotide of E2, respectively. In addition, E2 may
contain four nucleotides, and the third 3' most
nucleotide of S3 may be complementary to the fourth 5'
most nucleotide of E2.

In a seventh aspect, the invention features a
method of ligating a first nucleic acid molecule to a
second nucleic acid molecule. In this method, the first
and second nucleic acid molecules are contacted with a
nucleic acid molecule having ligase activity (e.g., DNA
ligase activity). The nucleic acid molecule having
ligase activity, as well as the first and second nucleic
acid molecules, may contain DNA, RNA, or modifications or
combinations thereof.

The ease with which DNA oligonucleotides can be
synthesized and their relatively high stability represent
major advantages over other biopolymer catalysts, such as
proteins and RNA, for, e.g., industrial, research, and
therapeutic applications. Other features and advantages
of the invention will be apparent from the following
description of the preferred embodiments thereof, and
from the claims.



Detailed Description

The drawings are first described.

Drawings

Fig. 1 is a schematic representation of the in
vitro selection strategy used to isolate DNA molecules
having DNA ligase activity. Each molecule in the
single-stranded DNA (ssDNA) pool contained 116 random bases
flanked by constant regions having sequences
complementary to the PCR primers
5'-GGAACACTATCCGACTGGCACC-3' (SEQ ID NO: 29) and
5'-biotin-CGGGATCCTAATGACCAAGG-3' (SEQ ID NO: 30). The pool was
prepared by solid-phase phosphoramidite chemistry and
amplified by PCR (Ellington et al., Nature 355:850-852,
1992) to yield approximately 32 copies of 3.5 x 10¹⁴
different molecules. Single-stranded DNA was prepared
from the amplified pool as described by Bock et al.
(Nature 355:564-566, 1992). The activated substrate
(5'-biotin-AAGCATCTAAGCATCTCAAGC-p-Im (SEQ ID NO: 31))
contained a 5'-biotin group and a 3'-phosphorimidazolide
(Chu et al., Nucleic Acids Res. 14:5591-5603, 1986).
Eight copies of the DNA pool (0.5 µM) were incubated in
selection buffer (30 mM Hepes, pH 7.4, 600 mM KCl, 50 mM
MgCl₂, 1 mM ZnCl₂) with 1 µM activated substrate and 1 µM
of an external template (5'-CGGATAGTGTTCCGCTTGAGATGCTT-3'
(SEQ ID NO: 32)) complementary to the 5' end of the pool
and the 3' end of the activated substrate. After a two
hour incubation, the reaction was stopped by addition of
EDTA. After 24 hr, 0.5% ligated product was present. No
product formation was observed in the absence of the
external template. At cycle 7, pool activity was
independent of the external template, indicating that the
remaining pool molecules were using an internal substrate
binding site. In cycles 8 and 9, no external template
was added, and the reaction time was decreased to 2 and
0.5 minutes, respectively, in order to increase selection
stringency. To isolate ligated molecules, the reacted
pool was passed through a streptavidin agarose affinity
column (Pierce, Rockford, IL), unligated pool was washed
off the column under denaturing conditions (3 M urea
followed by 150 mM NaOH, 40 column volumes each), and the
ligated pool was specifically eluted with excess free
biotin (Wilson et al., Nature, in press, 1995). To
select for substrate ligation to the 5'-hydroxyl of the
pool molecules, isolated DNA was selectively PCR
amplified (in cycles 6-9 only) with a first primer
corresponding to the substrate sequence and a second
primer complementary to the 3' constant region of the
pool, and gel purified. This pool was then subjected to
nested PCR with the first set of primers, gel purified,
and re-amplified for ssDNA isolation (Bock et al., Nature
355:564-566, 1992). Nine cycles of
selection-amplification were performed, after which the pool
activity remained constant.
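
The role of the external template as a splint can be checked directly from the sequences quoted above. The sketch below assumes that the pool strand begins with the SEQ ID NO: 29 primer sequence (an assumption; the text states only that the constant regions are complementary to the PCR primers) and asks how many template bases pair with the 3' end of the substrate and how many with the 5' end of the pool when the two ends abut. The function name is invented for this example.

```python
def reverse_complement(seq):
    """Return the reverse complement of a DNA sequence written 5' to 3'."""
    return seq.translate(str.maketrans("ACGT", "TGCA"))[::-1]


POOL_5_CONSTANT = "GGAACACTATCCGACTGGCACC"   # SEQ ID NO: 29 (assumed pool 5' end)
SUBSTRATE = "AAGCATCTAAGCATCTCAAGC"          # SEQ ID NO: 31, without end groups
TEMPLATE = "CGGATAGTGTTCCGCTTGAGATGCTT"      # SEQ ID NO: 32


def splint_arms(template, substrate, pool_5_constant):
    """Return (bases paired to substrate 3' end, bases paired to pool 5' end)
    if the template spans the junction between the two, else None."""
    target = reverse_complement(template)  # strand the template pairs with
    for n_sub in range(1, len(target)):
        n_pool = len(target) - n_sub
        if n_sub > len(substrate) or n_pool > len(pool_5_constant):
            continue
        if (target[:n_sub] == substrate[-n_sub:]
                and target[n_sub:] == pool_5_constant[:n_pool]):
            return n_sub, n_pool
    return None


print(splint_arms(TEMPLATE, SUBSTRATE, POOL_5_CONSTANT))
```

Under the stated assumption, the template pairs with equal-length stretches on either side of the ligation junction, which places the 3'-phosphorimidazolide of the substrate next to the 5'-hydroxyl of the pool molecule.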

Fig. 2A is a denaturing acrylamide gel analysis of
a time course of ligation reactions catalyzed by pool 9
ssDNA. Internally labeled pool 9 DNA (0.5 µM) was
incubated with activated substrate (1 µM) in selection
buffer for the indicated times. In a control reaction,
the substrate was not activated (lane 5). DNAs were
separated by electrophoresis in a 6% polyacrylamide/8 M
urea gel. Radioactivity was detected using a Molecular
Dynamics PhosphorImager.

Fig. 2B is a schematic representation of the
sequences of clones isolated from pool 9 DNA. DNA from
pool 9 was amplified by PCR and cloned into pT7Blue
T-Vector (Novagen, Madison, WI). Each of the clones
analyzed was sequenced in both directions using the
standard dideoxy sequencing method. The 21 sequences
(SEQ ID NOs: 1-21) shown in the figure share a consensus
sequence consisting of two conserved domains (SEQ ID NOs:
22 and 23). Upper and lower case letters in the
consensus indicate highly and moderately conserved
positions, respectively. X and Z represent
non-conserved, but complementary bases. The bolded T in
domain I is present in 50% of the clones.

Fig. 3A is a schematic representation of the
proposed secondary structure for the consensus sequence
of the DNA molecules having DNA ligase activity isolated
from pool 9 DNA. The 5' end of domain I and the 3' end
of domain II base-pair with the 5' constant region of the
pool (SEQ ID NO: 25) and the activated substrate (SEQ ID
NO: 24), respectively. The two complementary regions
("NNNN" of SEQ ID NO: 26 and "NNNN" of SEQ ID NO: 27)
form a stem structure and bring the flanking domains into
close proximity. Dotted lines indicate possible
interactions between the bases at the ligation junction
and the sequence between the two boxed sequences, TTT and
AGA.

Fig. 3B is a schematic representation of a minimal
DNA catalyst (SEQ ID NO: 28). Non-conserved regions in
the DNA structure shown in Fig. 3A were deleted in order
to generate a three-fragment complex in which the
formation of a phosphodiester bond between the
3'-phosphorimidazolide substrate S1 and the 5'-hydroxyl
substrate S2 is catalyzed by the 47 nucleotide
metalloenzyme E47.

Fig. 3C is a denaturing acrylamide gel analysis of
a time course of ligation of activated substrate S1 and
radiolabeled substrate S2 by the catalyst E47. No
reaction was detectable when activated S1 (lanes 1 and 5)
or E47 (lane 6) was absent.

Fig. 3D is a table showing the initial rates of
ligation catalyzed by E47, E47-3T, E47-AGA, E47-hairpin,
and pool 9 ssDNA. Activated substrate S1 (1 µM) and
radiolabeled S2 (0.5 µM; S2 was 3'-end labeled using
[α-³²P]-cordycepin-5'-triphosphate (NEN Dupont, Boston, MA)
and terminal transferase (Promega, Madison, WI)) were
incubated with the different catalysts (0.75 µM) at 25°C.
Reaction conditions are as in Fig. 1, with the following
changes: 30 mM Hepes, pH 7.2, and 4 mM ZnCl₂. DNA was
separated by electrophoresis on a 12% polyacrylamide/8 M urea gel.
Initial rate values were determined by fitting fraction ligated
vs. time to a linear equation using KaleidaGraph, and are
the average of two independent experiments measured at
less than 20% product formation. E47-3T and E47-AGA are
E47 derivatives in which the conserved TTT and AGA
sequences are deleted, respectively. E47-hairpin is an
E47 derivative in which the hairpin has been replaced by
5'-CCATG-3'. The background reaction, containing an
external template (see Fig. 1), was measured over a six
hour incubation. No product was detected in the absence
of the template, corresponding to a maximum rate of 2 x
10⁻⁵ hr⁻¹.
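
The linear fit used to obtain initial rates can be sketched with an ordinary least-squares slope. This is a minimal illustration rather than the KaleidaGraph procedure itself: the time course below is hypothetical, and the only element taken from the text is the restriction to points with less than 20% product formation.

```python
def fit_initial_rate(times_hr, fraction_ligated, max_fraction=0.20):
    """Least-squares slope and intercept of fraction ligated vs. time (hours),
    using only points below `max_fraction` product formation."""
    points = [(t, f) for t, f in zip(times_hr, fraction_ligated)
              if f < max_fraction]
    if len(points) < 2:
        raise ValueError("need at least two points below the product cutoff")
    n = len(points)
    mean_t = sum(t for t, _ in points) / n
    mean_f = sum(f for _, f in points) / n
    sxx = sum((t - mean_t) ** 2 for t, _ in points)
    sxy = sum((t - mean_t) * (f - mean_f) for t, f in points)
    slope = sxy / sxx
    intercept = mean_f - slope * mean_t
    return slope, intercept


# Hypothetical time course (hours, fraction ligated); not data from the source.
times = [0.0, 0.5, 1.0, 1.5, 2.0, 10.0]
fractions = [0.00, 0.02, 0.04, 0.06, 0.08, 0.30]

rate, offset = fit_initial_rate(times, fractions)
print(f"initial rate = {rate:.3f} per hour (intercept {offset:.3f})")
```

Excluding the late point keeps the fit within the early, approximately linear part of the reaction, which is the reason for the 20% cutoff.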

Fig. 4A is a denaturing acrylamide gel analysis of
an experiment showing the effect of Mg²⁺, Zn²⁺, and Cu²⁺
on catalysis. Reactions were incubated for 20 minutes at
the indicated divalent metal ion concentrations. No
reaction was detected in the absence of Zn²⁺ and Mg²⁺
(lane 2), or with only Mg²⁺ (lane 3). Mg²⁺ is not
required for activity, and Zn²⁺ alone (lane 4) catalyzes
the reaction with the same efficiency as Zn²⁺ and Mg²⁺
together. Cu²⁺ is the only divalent metal found that can
substitute for Zn²⁺ (lane 5); it does not require Mg²⁺ for
activity. The rate of ligation is independent of
monovalent metal ions. Potassium chloride can be
substituted by lithium chloride, sodium chloride, or cesium
chloride, or removed with no significant effect on
product formation.

Fig. 4B is a graph showing the effects of zinc (o)
and copper (t) concentrations on product formation. The
reaction incubation time was 7 minutes.

Fig. 4C is a graph showing log(Kobs) versus pH.
In the presence of 10 µM CuCl₂, there is a linear
correlation between the log of Kobs and pH, with a slope
of 0.7 up to pH 6.8. At higher pH values, the activity
decreases linearly with a slope of -0.7. A slope close
to +1 suggests that proton abstraction is involved in the
rate determining step of the reaction, while a slope of
-1 is indicative of proton donation (Fersht, Enzyme
Structure and Mechanism (Freeman, New York, 1985)). The
observed rate is independent of buffer concentration
between 30-150 mM. A similar effect was observed with
Zn²⁺ at 4 mM up to pH 7.4. At higher pH, the activity
drops drastically, possibly due to the formation of
insoluble metal oxides or hydroxides (Bailar, Jr. et al.,
Comprehensive Inorganic Chemistry (Pergamon Press Ltd.,
1973)). The reaction conditions were as specified in the
description of Fig. 3.

Isolation of DNA molecules having DNA ligase activity

Oligodeoxynucleotides can be non-enzymatically
ligated on either single-stranded (Naylor et al.,
Biochemistry 5:2722-2728, 1966) or duplex (Luebke et al.,
J. Am. Chem. Soc. 111:8733-8735, 1989) DNA templates. We
designed an in vitro selection strategy (Szostak, Trends
Biochem. Sci. 17:89-93, 1992; Chapman et al., Curr. Opin.
Struct. Biol. 4:618-622, 1994; Breaker et al., Trends
Biotechnol. 12:268-275, 1994; Joyce, Curr. Opin. Struct.
Biol. 4:331-336, 1994) in order to determine whether DNA
sequences which catalyze DNA ligation more efficiently
than non-enzymatic templating could be isolated from a
large pool of random sequences (Fig. 1). Using this
strategy, a small single-stranded DNA that is a
Zn²⁺/Cu²⁺-dependent metalloenzyme was isolated. The enzyme
catalyzes the formation of a new phosphodiester bond by
the condensation of the 5'-hydroxyl group of one
oligodeoxynucleotide and a 3'-phosphorimidazolide group
on another oligodeoxynucleotide, and shows multiple
turnover ligation.

The details of the selection strategy are
illustrated in Fig. 1. After nine cycles of selection
and amplification, the DNA pool (pool 9) displayed
efficient ligation activity (Fig. 2A). Incubation of
pool 9 DNA with the activated substrate yields a ligated
product with the correct molecular weight and the
expected nucleotide sequence at the ligation junction.
To analyze further the selected sequences, DNA from pool
9 was cloned and sequenced. The majority of the clones
contain a common consensus sequence consisting of two
small domains separated by a spacer region of variable
length and sequence (Fig. 2B). The two small domains are
embedded in entirely different flanking sequences,
indicating that several independent sequences in the
original pool were carried through the selection process.
Inspection of the consensus sequence suggests a secondary
structure that is more complex than a simple template,
but nevertheless brings the 5'-hydroxyl group and the
3'-phosphorimidazolide group into close proximity (Fig. 3A).

Based on the consensus sequence, a small 47 nt
ssDNA catalyst (E47) was designed that ligates two
separate DNA substrates, S1 and S2 (Fig. 3B). Incubation
of radiolabeled S2 with activated substrate S1 and E47
catalyst results in the appearance of the expected
ligated product (Fig. 3C). Product formation requires
that all three components are present in the reaction.
In addition, the 3'-phosphate group of S1 must be
activated. E47 catalyzes the ligation reaction twice as
fast as pool 9. Small deletions within E47 result in
severe losses of catalytic efficiency (Fig. 3D),
indicating that the central consensus sequence is
necessary for catalysis. The initial rate of ligation of
S1 and S2 by E47 is 3400-fold greater than the rate of
the same reaction catalyzed by a simple complementary
template under the same conditions, and is at least
10⁵-fold faster than the untemplated background ligation
(Fig. 3D). This rate enhancement is comparable to values
obtained for ribozymes obtained by in vitro selection
(Szostak, Trends Biochem. Sci. 17:89-93, 1992; Chapman et
al., Curr. Opin. Struct. Biol. 4:618-622, 1994; Breaker
et al., Trends Biotechnol. 12:268-275, 1994; Joyce, Curr.
Opin. Struct. Biol. 4:331-336, 1994) and catalytic
antibodies (Lerner et al., Science 252:659-667, 1991).

Since the catalyst is not consumed in the
reaction, it was expected that E47 would be capable of
catalyzing the ligation of several molar equivalents of
substrates S1 and S2, provided that the ligated product
is able to dissociate from the enzyme. At saturating
concentrations (140 µM) of both substrates and 1 µM E47,
multiple turnover catalysis was observed at a rate of
0.66 hr⁻¹ at 25°C and 2.4 hr⁻¹ at 35°C (10 turnovers
observed). At these temperatures, product release appears
to be rate limiting, as a rapid initial burst of
approximately one equivalent of product formation was
observed within the first 10 minutes of the reaction.
The initial rate of ligation in this burst phase was
directly proportional to the concentration of E47 over a
30-fold range, as expected for an enzyme at saturating
substrate concentration (Fersht, Enzyme Structure and
Mechanism (Freeman, New York, 1985)). A plot of k_obs vs.
[E47] yielded a k_cat of 3.2 hr⁻¹ (0.07 min⁻¹) at 25°C.
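
The multiple turnover rates above can be restated as an average time per turnover. The sketch below does only that conversion, using the two steady-state rates given in the text; it ignores the initial burst phase, and the function names are illustrative rather than part of the specification.

```python
# Multiple-turnover rates reported in the text (turnovers per enzyme per hour).
RATES_PER_HOUR = {25: 0.66, 35: 2.4}   # temperature in deg C -> rate in hr^-1


def minutes_per_turnover(rate_per_hour: float) -> float:
    """Average time for one enzyme molecule to complete one turnover."""
    return 60.0 / rate_per_hour


def hours_for_turnovers(rate_per_hour: float, turnovers: int) -> float:
    """Time to accumulate a number of steady-state turnovers (burst ignored)."""
    return turnovers / rate_per_hour


for temp_c, rate in RATES_PER_HOUR.items():
    print(f"{temp_c} C: {minutes_per_turnover(rate):.0f} min per turnover, "
          f"{hours_for_turnovers(rate, 10):.1f} h for 10 turnovers")
```

On this simplified view, a turnover takes about an hour and a half at 25°C and about 25 minutes at 35°C, which is consistent with slow product release limiting the steady-state rate.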

Because divalent metal ions play a crucial role in
ribozymes (Pyle, Science 261:709-714, 1993) and many
protein enzymes (Karlin, Science 261:701-708, 1993), it
was expected that the DNA catalyst would require Mg²⁺,
Zn²⁺, or both for activity, as these ions were present
in the selection buffer. Indeed, the ligation reaction
is dependent on Zn²⁺ (Fig. 4A), but does not require Mg²⁺.
All of the members of the Irving-Williams series (Ba²⁺,
Sr²⁺, Ca²⁺, Mg²⁺, Mn²⁺, Fe²⁺, Co²⁺, Ni²⁺, Cu²⁺, Zn²⁺), as
well as Pb²⁺ and Cd²⁺, were tested at concentrations
between 10 µM and 10 mM, and it was found that only Cu²⁺
could substitute for Zn²⁺. The efficiency of the ligation
reaction is highly dependent on the divalent metal ion
concentration (Fig. 4B). Increasing concentrations of
Zn²⁺ up to 4 mM enhanced activity, but at higher
concentrations the activity dropped sharply, suggesting
the existence of inhibitory metal binding sites. A
similar concentration dependence was observed for copper,
but at a 400-fold lower concentration. The metal ion
specificity suggests the existence of one or more metal
ion binding sites with stringent geometrical and/or size
requirements.

To gain insight into the ligation mechanism, the
pH-rate profile of the reaction under pre-steady-state
(single turnover) conditions was determined (Fig. 4C).
The bell-shaped profile displayed with Cu²⁺ suggests that
the rate-limiting step of the ligation reaction depends
in part on two ionizable groups, one acidic and one
basic, raising the possibility of a general acid-base
mechanism (Fersht, Enzyme Structure and Mechanism
(Freeman, New York, 1985)) in which copper complexes are
involved in proton transfer. Metal-ion hydroxides are
thought to act as general bases in some ribozyme-mediated
RNA cleavage reactions (Pyle, Science 261:709-714, 1993;
Dahm et al., Biochemistry 32:13040-13045, 1993; Pan et
al., Biochemistry 33:9561-9565, 1994). Other
possibilities, such as pH-dependent folding effects, may
also account for these observations (Kao et al., Proc.
Natl. Acad. Sci. USA 77:3360-3364, 1980).
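
A bell-shaped pH-rate profile of the kind described above is commonly modeled by assuming that the active species requires one group to be deprotonated and a second group to be protonated. The following sketch uses that textbook two-pKa expression. The pKa values and the limiting rate are hypothetical, chosen only to show the shape of the curve; they are not values measured for E47.

```python
def bell_shaped_rate(ph, k_max, pka_low, pka_high):
    """Observed rate when activity needs one group deprotonated and one protonated.

    pka_low:  pKa of the group that must be deprotonated (sets the low-pH limb).
    pka_high: pKa of the group that must be protonated (sets the high-pH limb).
    k_max:    limiting rate if all of the catalyst were in the active form.
    """
    return k_max / (1.0 + 10.0 ** (pka_low - ph) + 10.0 ** (ph - pka_high))


# Hypothetical parameters, for illustration only.
K_MAX = 1.0
PKA_LOW, PKA_HIGH = 6.0, 8.0

profile = {ph: bell_shaped_rate(ph, K_MAX, PKA_LOW, PKA_HIGH)
           for ph in (5.0, 6.0, 7.0, 8.0, 9.0)}
for ph, rate in profile.items():
    print(f"pH {ph:.1f}: relative rate {rate:.3f}")
```

In this model the rate peaks midway between the two pKa values and falls off by about tenfold per pH unit on either side, which is the behavior that points to two ionizable groups. As noted in the text, pH-dependent folding could produce a similar profile, so a fit to this expression would not by itself establish the mechanism.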

E47 and substrates S1 and S2 were modified so that
ligation of the modified substrates by the modified
enzyme results in formation of a ligated product having
the sequence of the modified enzyme. The sequences of
three such enzymes (E), and their corresponding
substrates (S1 and S2), are as follows:

I. E: 5'-ACCTTCACCTTCTTTCGCTAGACCTTCAAGCGGAAGGTGAAGGTCTAGCG-3'
(SEQ ID NO: 33)

S1: 5'-ACCTTCACCTTCTTTCGCTAGACCTTCAAGC-3'
(SEQ ID NO: 34)

S2: 5'-GGAAGGTGAAGGTCTAGCG-3' (SEQ ID NO: 35)

II. E: 5'-ACCTTCACCTTCTTTCGCTAGACCTTCAAGCGGAAGGTGAAGGTCTA-3'
(SEQ ID NO: 36)

S1: 5'-ACCTTCACCTTCTTTCGCTAGACCTTCAAGC-3'
(SEQ ID NO: 34)

S2: 5'-GGAAGGTGAAGGTCTA-3' (SEQ ID NO: 37)

III. E: 5'-CTTCACCTTCTTTCGCTAGACCTTCAAGCGGAAGGTGAAGGTCTA-3'
(SEQ ID NO: 38)

S1: 5'-CTTCACCTTCTTTCGCTAGACCTTCAAGC-3'
(SEQ ID NO: 39)

S2: 5'-GGAAGGTGAAGGTCTA-3' (SEQ ID NO: 37)
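
The statement that ligation of each substrate pair yields a product having the sequence of the corresponding enzyme can be checked directly from the listed sequences. The sketch below copies the sequences from SEQ ID NOS: 33-39 in the Sequence Listing (written in blocks of ten, as they appear there) and tests whether S1 followed by S2 reproduces E. The helper names are illustrative only.

```python
def seq(grouped: str) -> str:
    """Join a sequence written in blocks of ten, as in the Sequence Listing."""
    return "".join(grouped.split())


# (enzyme E, substrate S1, substrate S2), copied from SEQ ID NOS: 33-39.
CONSTRUCTS = {
    "I": (seq("ACCTTCACCT TCTTTCGCTA GACCTTCAAG CGGAAGGTGA AGGTCTAGCG"),
          seq("ACCTTCACCT TCTTTCGCTA GACCTTCAAG C"),
          seq("GGAAGGTGAA GGTCTAGCG")),
    "II": (seq("ACCTTCACCT TCTTTCGCTA GACCTTCAAG CGGAAGGTGA AGGTCTA"),
           seq("ACCTTCACCT TCTTTCGCTA GACCTTCAAG C"),
           seq("GGAAGGTGAA GGTCTA")),
    "III": (seq("CTTCACCTTC TTTCGCTAGA CCTTCAAGCG GAAGGTGAAG GTCTA"),
            seq("CTTCACCTTC TTTCGCTAGA CCTTCAAGC"),
            seq("GGAAGGTGAA GGTCTA")),
}


def ligation_regenerates_enzyme(enzyme: str, s1: str, s2: str) -> bool:
    # S1 carries the activated 3'-phosphate and S2 the 5'-hydroxyl, so the
    # ligated product reads S1 followed by S2 in the 5'-to-3' direction.
    return s1 + s2 == enzyme


results = {name: ligation_regenerates_enzyme(*parts)
           for name, parts in CONSTRUCTS.items()}
lengths = {name: tuple(len(p) for p in parts)
           for name, parts in CONSTRUCTS.items()}
print(results)
print(lengths)
```

The lengths computed this way can also be compared against the LENGTH fields of the corresponding Sequence Listing entries.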

The differences between these enzymes and E47 are in (1)
the stem formed between E47 and the
5'-hydroxyl-containing substrate S2, (2) the stem formed
between E47 and the activated substrate S1, (3) the
intramolecular stem in E47, and (4) the loop in E47. The
sequence of the presumed core of the ligation site was
not changed. The modified enzymes differ from one another
only in the number of base pairs between the enzyme and
the substrates. The modified enzymes catalyze ligation of
their respective substrates, which shows that the primary
nucleotide sequences of at least some parts of the stem
and loop structures depicted in Fig. 3B are not required
for enzyme activity, and further that the unchanged
regions of the enzyme are sufficient for maintenance of
ligase activity, in the presence of the stem structures
defined by S¹-E¹ and S⁴-E⁶.

SEQUENCE LISTING



(1) GENERAL INFORMATION: 

(i) APPLICANT: The General Hospital Corporation 
(ii) TITLE OF INVENTION: CATALYTIC DNA 
(iii) NUMBER OF SEQUENCES: 39

(iv) CORRESPONDENCE ADDRESS: 

(A) ADDRESSEE: Fish & Richardson P.C. 

(B) STREET: 225 Franklin Street, 

(C) CITY: Boston 

(D) STATE: MA 

(E) COUNTRY: USA 

(F) ZIP: 02110-2804 

(v) COMPUTER READABLE FORM:

(A) MEDIUM TYPE: Floppy disk 

(B) COMPUTER: IBM PC compatible 

(C) OPERATING SYSTEM: PC-DOS/MS-DOS 

(D) SOFTWARE: PatentIn Release #1.0, Version #1.30

(vii) PRIOR APPLICATION DATA: 

(A) APPLICATION NUMBER: 08/487,867 

(B) FILING DATE: 07-JUN-1995 

(C) CLASSIFICATION: 

(viii) ATTORNEY/AGENT INFORMATION:

(A) NAME: Lech, Karen F. 

(B) REGISTRATION NUMBER: 35,238 

(C) REFERENCE/DOCKET NUMBER: 00786/273001

(ix) TELECOMMUNICATION INFORMATION: 

(A) TELEPHONE: (617) 542-5070 

(B) TELEFAX: (617) 542-8906 

(C) TELEX: 200154 



(2) INFORMATION FOR SEQ ID NO: 1:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 1:
TATGTGTCGA TTGTGTTCTT TCGCTAGACC ATGTGAGACT TATGCTTCGA ATTGTCGAGT 60 
TTTTGACTGT TTGCTTGGCC GGCTGGTGGT CGTGCATGGT GAGATGATTA CCCTA 115 
(2) INFORMATION FOR SEQ ID NO: 2:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 2:
TATGTGTCGA TTGTGTTCTT TCGCTAGACC ATGTGGGACT TATGCTTCGA ATTGTCGAGT 60 
TTTTGACTGT TTGCTTGGCT GGCTGGTGGC CGCGCATGGT GAGATGATTA TCCCT 115 
(2) INFORMATION FOR SEQ ID NO: 3: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 3:
TATGTGTCGA TTGTGTTCTT TCGCTAGACC ATGTGAGACT TATGCTTCGA ATTGTCGAGT 60 
TTTTGACTGT TTGCTTGGCC GGCTGGTGGT CGCGCATGGT GAGATGATTA TCCCTA 116 
(2) INFORMATION FOR SEQ ID NO: 4: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 4: 
TATGTGTCGA TTGTGTTCTT CCGCTAGACC ATGTGAGACT TATGCTTCGA ATTGTCGAGT 60 
TTTTGACTGT TTGCTTGGCC GGCTGGTGGT CGCGCATGGT GAGATGATTA TTCCCTG 117 
(2) INFORMATION FOR SEQ ID NO: 5: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 5: 
TATAGTCAGG CTGGTAGGGT TCTTTCGCAG AGTGCGATGT GTTTTGATTT GAACTTATTT 60 
ATGAGGTCTG TTGAAGCCCA TTGCGACTGA GTGCTTGCTG CTTGTTACTT TCCCTT 116 
(2) INFORMATION FOR SEQ ID NO: 6: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 6: 
TATAGTCAGG CTGGTAGGGT TCTTTCGCAG AGTGCGATGT GTTTTGATTT GAACTTATTT 60 
ATGAGGTCTG TTGAAGCCCA TTGCGACTGA GTGCTTGCTG CTTGTTACTT TCCCAT 116 
(2) INFORMATION FOR SEQ ID NO: 7: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 7: 
TATAGTCAGG CTGGTAGGGT TCTTTCGCAG AGTGCGATGT GTTTTGATTT GAACTTATTT 60 
ATGAGGTCTG TTGAAGCCCA TTGCGACTGA GTGCTTGCGG CTTGTTACTT TCCCAT 116 
(2) INFORMATION FOR SEQ ID NO: 8: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 8:
TATAGTCAGG CTGGTAGGGT TCTTTCGCAG AGTGCGATGT GTTTTGATTT GAACTTATTT 60
ATGAGGTCGG TTGAAGCTCA TTGCGACTGA GTGCTTGCTG CTTGTTACTT TCCCAC 116 
(2) INFORMATION FOR SEQ ID NO: 9: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 9: 
CGTTTCGTTT TGGAAGGCCT GTTGGTCCTT GTGTTCTCTC GCAGACCACT TTTTCGTACA 60 
CGGAAGTGGA TTAAGTGGTG AGTTGCTTTC TAGTATGCGC TTTGAGGTAT TCTATG 116 
(2) INFORMATION FOR SEQ ID NO: 10: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 10: 
CGTTTCGATT TGGAAGGCCT GTTGGTCCTT GTGTTCTCTC GCAGACCACT TTTTCGTTCA 60 
CGGAAGTGGA ATAAGTGGTG AGTTGCTTTC TAGTGTGCGC TTTGAGGTAT TCTATG 116 
(2) INFORMATION FOR SEQ ID NO: 11: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 11: 
CGTTTCGTTT TGGAAGGCCT GTTGGTCCTT GTGTTCTCTC GCAGACCACT TTTTCGTTCA 60 
CGGAAGTGGA TTAAGTGGTG AGTTGCTTTC TAGTGTGCGC TTTGAGGAAT TCTATG 116 
(2) INFORMATION FOR SEQ ID NO: 12:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 12: 
CGTCTTGCTG GGTTTTTGCT CGGTATCGTT CTTTCGCTAG ACCTTTAAAT AATGGTGAGA 60 
TGCTGTTTTT GAGGCTAGTA GCGCGGGATT GGGCGTTACC GTCGTTTGTC TTTCGA 116 
(2) INFORMATION FOR SEQ ID NO: 13: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 13: 
CACGTACTTC TTGTAGACGT GTGGCTTTGA TAGGATGTGG TCTTTCGCTA GAGTTAATTA 60 
GCTGTGGACC CTTAAGGTGT CTTAACTGAG ATGCTTTCAT TTTGTCTTTC TGATT 115 
(2) INFORMATION FOR SEQ ID NO: 14: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 14: 
GAGCGTGGCT AACTGGATAG TGGTCTCTCG CTAGACACCT GTGTGAGATT GTTAGAATGC 60 
GGTCCATCTG CCTATTTGGT AGTTAAGGGT TTATGCTGTT CCTCTGATCA CTTTCG 116 
(2) INFORMATION FOR SEQ ID NO: 15: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 15: 
GTTTTTGTGT TTGACGAATA CGTGTTCTTT CGCAGACCTT GTGCATCTTT GTTGTCGCAA 60 
GGTGAGATGC TTGTGTTGTT TGCTTTTTCA TGTTTGCTTG TCCTTGTTTT TAAAC 115 
(2) INFORMATION FOR SEQ ID NO: 16: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 16: 
TTGTGGTTGT GACCGGTTAG GATAGTGTTA TTTCGCAGAC CACATCACCG TATTTTGGTG 60 
AGTGGTGAGA TGCTGCTATT TTGTGGTGTT GCACCCGCTT AAATACTTCG AGGTTT 116 
(2) INFORMATION FOR SEQ ID NO: 17: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 17: 
TTTGGTTTCG CAGTTGGTGT GTTCGTTCGC AGACCCTTTG GGTGAGATTG CTTTTGCGGC 60 
TTTGAGTGAT CCTGCCTTGT GGTATTGTTG TGCATGTGAT AGCTTGTTCT GCTCAT 116
(2) INFORMATION FOR SEQ ID NO: 18:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 114 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 18:
TGGGGATCGC GGTATTAGTG TGTGCGTACT TTGGCTGACG GTGGCCGTCG TGGTATGTCT 60
GTTCTGTCGC ATGATCCAAT CTTCCCGGTT GGATGAGATG CTTGATTATG CTTA 114 
(2) INFORMATION FOR SEQ ID NO: 19: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 117 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 19: 
TTTCTTGGGC TTAAGCTCGG TTATTGTTCT TTCGCTAGAT CCATGTCTAT ATTATGGTTG 60 
GGCCGACTGG TTTTTTACTT ATACTATTGT TTTTGTGGCG TGGATGAGAT GCTGTTT 117 
(2) INFORMATION FOR SEQ ID NO: 20: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 116 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 20: 
TCAGGTGTTT TTGTTTTTCT GAGCAGGGAG TCGGTGTGTT CTTTCGCAGA CACGAGTTTT 60 
TTGTGTGAGA TTGCTTAGTG TTCTTTGTTC AATCACTAGA TTTCTTGATG GGTGTG 116 
(2) INFORMATION FOR SEQ ID NO: 21: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 115 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 21: 
GTCGGTTCAT GTTGTTCTTT CGCCAGATGA TCGCGGCGTT TTAGTTTACG TCACTCGACG 60 
TATTTTCTAC GGGGTTTAGG CTTTGTCGAT CATGAGTTGC TTAGATTGAT TTTTT 115 
(2) INFORMATION FOR SEQ ID NO: 22: 
(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 27 base pairs

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 22:
CGGATAGTGT TCTTTCGCTA GANNNNN 
(2) INFORMATION FOR SEQ ID NO: 23: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 15 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 23: 
NNNNNTGAGA TGCTT 

(2) INFORMATION FOR SEQ ID NO: 24:

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 24: 
AAGCATCTCA AGC 

(2) INFORMATION FOR SEQ ID NO: 25: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 13 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 25:
GGAACACTAT CCG

(2) INFORMATION FOR SEQ ID NO: 26: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 26:
CGGATAGTGT TCTTTCGCTA GANNNN 
(2) INFORMATION FOR SEQ ID NO: 27: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 14 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 27: 
NNNNTGAGAT GCTT 

(2) INFORMATION FOR SEQ ID NO: 28: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 28: 
CGGATAGTGT TCTTTCGCTA GACCATGTGA CGCATGGTGA GATGCTT 
(2) INFORMATION FOR SEQ ID NO: 29: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 22 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA

(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 29:
GGAACACTAT CCGACTGGCA CC 
(2) INFORMATION FOR SEQ ID NO: 30: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 20 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 30: 
CGGGATCCTA ATGACCAAGG 
(2) INFORMATION FOR SEQ ID NO: 31: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 21 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 31: 
AAGCATCTAA GCATCTCAAG C 
(2) INFORMATION FOR SEQ ID NO: 32: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 26 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 32: 
CGGATAGTGT TCCGCTTGAG ATGCTT 
(2) INFORMATION FOR SEQ ID NO: 33: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 50 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear
(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 33: 
ACCTTCACCT TCTTTCGCTA GACCTTCAAG CGGAAGGTGA AGGTCTAGCG 50 
(2) INFORMATION FOR SEQ ID NO: 34: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 31 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 34: 
ACCTTCACCT TCTTTCGCTA GACCTTCAAG C 31 
(2) INFORMATION FOR SEQ ID NO: 35: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 19 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 35: 
GGAAGGTGAA GGTCTAGCG 19 
(2) INFORMATION FOR SEQ ID NO: 36: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 47 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 36: 
ACCTTCACCT TCTTTCGCTA GACCTTCAAG CGGAAGGTGA AGGTCTA 47
(2) INFORMATION FOR SEQ ID NO: 37:

(i) SEQUENCE CHARACTERISTICS:

(A) LENGTH: 16 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 37:
GGAAGGTGAA GGTCTA 16 
(2) INFORMATION FOR SEQ ID NO: 38: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 45 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 38: 
CTTCACCTTC TTTCGCTAGA CCTTCAAGCG GAAGGTGAAG GTCTA 45 
(2) INFORMATION FOR SEQ ID NO: 39: 

(i) SEQUENCE CHARACTERISTICS: 

(A) LENGTH: 29 base pairs 

(B) TYPE: nucleic acid 

(C) STRANDEDNESS: single 

(D) TOPOLOGY: linear 

(ii) MOLECULE TYPE: DNA 



(xi) SEQUENCE DESCRIPTION: SEQ ID NO: 39: 
CTTCACCTTC TTTCGCTAGA CCTTCAAGC 29 



What is claimed is:

CLAIMS

1. A method for obtaining a nucleic acid molecule
having ligase activity, said method comprising the steps
of:

a) providing a population of candidate nucleic
acid molecules, each having a region of random sequence;

b) contacting said population with:

(i) a substrate nucleic acid molecule; and

(ii) an external template complementary to a
portion of the 3' region of said substrate nucleic
acid molecule and a portion of the 5' region of
each of the candidate nucleic acid molecules in
said population, wherein binding of said external
template to said substrate nucleic acid molecule
and a candidate nucleic acid molecule from said
population juxtaposes said 3' and 5' regions, and
the terminal nucleotide of either said 3' or said
5' region contains an activated group;

c) isolating a subpopulation of nucleic acid
molecules having ligase activity from said population;

d) amplifying said subpopulation in vitro;

e) optionally repeating steps b-d for said
amplified subpopulation; and

f) isolating said nucleic acid molecule having
ligase activity from said amplified subpopulation.



2. The method of claim 1, wherein said optional 
repeating of steps b-d is carried out in the absence of 
said external template. 

3. The method of claim 1, wherein said nucleic
acid molecule having ligase activity or said substrate
nucleic acid molecule is DNA.

4. The method of claim 1, wherein the 5' terminal
nucleotide of said substrate nucleic acid contains a
biotin moiety.

5. The method of claim 1, wherein said activated
group is a 3'-phosphorimidazolide on the 3' terminal
nucleotide of said substrate.

6. A method for obtaining a DNA molecule having
ligase activity, said method comprising the steps of:

a) providing a population of candidate DNA
molecules, each having a region of random sequence;

b) contacting said population with a substrate
nucleic acid molecule;

c) isolating a subpopulation of DNA molecules
having ligase activity from said population;

d) amplifying said subpopulation in vitro;

e) optionally repeating steps b-d for said
amplified subpopulation; and

f) isolating said DNA molecule having ligase
activity from said amplified subpopulation.

7. The method of claim 6, wherein said substrate
nucleic acid molecule is DNA.

8. The method of claim 6, wherein the 5' terminal
nucleotide of said substrate nucleic acid contains a
biotin moiety.

9. The method of claim 6, wherein said activated
group is a 3'-phosphorimidazolide on the 3' terminal
nucleotide of said substrate.



10. A DNA molecule capable of acting as a
catalyst.

11. A DNA molecule capable of acting as a
catalyst on a nucleic acid substrate, said catalysis not
requiring the presence of a ribonucleotide in said
nucleic acid substrate.

12. A nucleic acid molecule having ligase
activity.

13. The nucleic acid molecule of claim 12,
wherein said nucleic acid molecule is DNA.

14. The nucleic acid molecule of claim 12,
wherein said ligase activity is DNA ligase activity.

15. A nucleic acid molecule capable of ligating a
first substrate nucleic acid to a second substrate
nucleic acid, wherein the rate of said ligating is
greater than the rate of ligating said substrate nucleic
acids by templating under the same reaction conditions.

16. A catalytic DNA molecule capable of ligating
a first substrate nucleic acid to a second substrate
nucleic acid, said first substrate nucleic acid
comprising the sequence 3'-S¹-S²-5', said second substrate
nucleic acid comprising the sequence 3'-S³-S⁴-5', and said
catalytic DNA molecule comprising the sequence
5'-E¹-TTT-E²-AGA-E³-E⁴-E⁵-E⁶-3', wherein

S¹ comprises at least two nucleotides positioned
adjacent to the 3' end of S², said S¹ nucleotides being
complementary to an equivalent number of nucleotides in
E¹ that are positioned adjacent to the 5' end of said
TTT;

S² comprises one - three nucleotides, S³ comprises
one - six nucleotides, and the 5' terminal nucleotide of
S² and the 3' terminal nucleotide of S³ alternatively
contain an activated group or a hydroxyl group;

S⁴ comprises at least two nucleotides positioned
adjacent to the 5' end of S³, said S⁴ nucleotides being
complementary to an equivalent number of nucleotides in
E⁶ that are positioned adjacent to the 3' end of E⁵;

E¹ comprises at least two nucleotides positioned
adjacent to the 5' end of said TTT, said E¹ nucleotides
being complementary to an equivalent number of
nucleotides in S¹ that are positioned adjacent to the 3'
end of S²;

E² comprises zero - twelve nucleotides;

E³ comprises at least two nucleotides positioned
adjacent to the 3' end of said AGA, said E³ nucleotides
being complementary to an equivalent number of
nucleotides in E⁵ that are positioned adjacent to the 5'
end of E⁶;

E⁴ comprises 3-200 nucleotides;

E⁵ comprises at least two nucleotides positioned
adjacent to the 5' end of E⁶, said E⁵ nucleotides being
complementary to an equivalent number of nucleotides in
E³ that are positioned adjacent to the 3' end of said
AGA; and

E⁶ comprises at least two nucleotides positioned
adjacent to the 3' end of E⁵, said E⁶ nucleotides being
complementary to an equivalent number of nucleotides in
S⁴ that are positioned adjacent to the 5' end of S³.

17. The catalytic DNA molecule of claim 16,
wherein E² comprises three - four nucleotides.

18. The catalytic DNA molecule of claim 17,
wherein the 5' most nucleotide of S² is complementary to
the 5' most nucleotide of E²; the 3' most nucleotide of S³
is complementary to the second 5' most nucleotide of E²;
and the second 3' most nucleotide of S³ is complementary
to the third 5' most nucleotide of E².

19. The catalytic DNA molecule of claim 18,
wherein E² comprises four nucleotides, and the third 3'
most nucleotide of S³ is complementary to the fourth 5'
most nucleotide of E².



20. The catalytic DNA molecule of claim 16,
wherein

a) S² comprises one nucleotide;

b) S³ comprises three nucleotides;

c) E⁴ comprises five nucleotides; or

d) E⁵ and E³ each comprise five nucleotides.



21. A method of ligating a first nucleic acid
molecule to a second nucleic acid molecule, said method
comprising contacting said first and said second nucleic
acid molecules with a nucleic acid molecule having ligase
activity.

22. The method of claim 21, wherein said nucleic
acid molecule having ligase activity is DNA.

23. The method of claim 21, wherein said ligase
activity is DNA ligase activity.



24. A nucleic acid molecule having ligase 
activity obtained by the steps of: 

a) providing a population of candidate nucleic 
acid molecules, each having a region of random sequence; 

b) contacting said population with: 

(i) a substrate nucleic acid molecule; and 

(ii) an external template complementary to a 
portion of the 3' region of said substrate nucleic 
acid molecule and a portion of the 5' region of 
each of the candidate nucleic acid molecules from 
said population, wherein binding of said external 
template to said substrate nucleic acid molecule 
and a candidate nucleic acid molecule in said 
population juxtaposes said 3' and 5' regions, and 
the terminal nucleotide of either said 3' or said 
5' region contains an activated group; 

c) isolating a subpopulation of nucleic acid 
molecules having ligase activity from said population; 

d) amplifying said subpopulation in vitro; 

e) optionally repeating steps b-d for said 
amplified subpopulation; and 

f) isolating said nucleic acid molecule having
ligase activity from said amplified subpopulation. 



25. The nucleic acid of claim 24, wherein said
optional repeating of steps b-d is carried out in the
absence of said external template.

26. The nucleic acid molecule having ligase
activity of claim 24, wherein said nucleic acid molecule
having ligase activity is DNA.

27. The nucleic acid molecule having ligase
activity of claim 24, wherein

a) the 5' terminal nucleotide of said substrate
nucleic acid contains a biotin moiety; or

b) said activated group is a 3'-phosphorimidazolide
on the 3' terminal nucleotide of said substrate.



28. A DNA molecule having ligase activity
obtained by the steps of:

a) providing a population of candidate DNA
molecules, each having a region of random sequence;

b) contacting said population with a substrate
nucleic acid molecule;

c) isolating a subpopulation of DNA molecules
having ligase activity from said population;

d) amplifying said subpopulation in vitro;

e) optionally repeating steps b-d for said
amplified subpopulation; and

f) isolating said DNA molecule having ligase
activity from said amplified subpopulation.

29. The DNA molecule having ligase activity of
claim 28, wherein

a) the 5' terminal nucleotide of said substrate
nucleic acid contains a biotin moiety; or

b) said activated group is a 3'-phosphorimidazolide
on the 3' terminal nucleotide of said substrate.